METHOD FOR TRACKING VISIBLE ELEMENTS OF INTEREST IN A VIDEO
Patent abstract:
The present invention relates to a method for tracking elements of interest visible in a video consisting of a sequence of K images, characterized in that it comprises the implementation, by data processing means (21) of a terminal (2), of steps of: (a) association of each element of interest of a first category visible in a k-th image of said video with an element of interest of a second category, different from the first category, visible in said k-th image; (b) calculation of an association cost for a plurality of pairs of an element of interest of the first category visible in at least one image of the video with an element of interest of the second category visible in at least one image of the video, as a function of at least the association counters of the pairs of an element of interest of the first category with an element of interest of the second category; (c) implementation of a combinatorial optimization algorithm as a function of the calculated association costs, so as to reassociate each element of interest of the first category visible in said k-th image with an element of interest of the second category visible in said k-th image; (d) updating of the association counters.
Publication number: FR3087038A1
Application number: FR1859158
Filing date: 2018-10-03
Publication date: 2020-04-10
Inventors: Cecile Jourdas; Dora CSILLAG; Maxime Thiebaut
Applicant: Idemia Identity and Security France SAS
IPC main class:
Patent description:
METHOD FOR TRACKING VISIBLE ELEMENTS OF INTEREST IN A VIDEO

GENERAL TECHNICAL AREA

The present invention relates to the field of supervised learning, and in particular to a method for tracking elements of interest visible in a video, in particular by using a convolutional neural network.

STATE OF THE ART

Neural networks are widely used for data classification. During a machine learning phase (generally supervised, that is to say on an already classified reference database), a neural network "learns" and becomes capable on its own of applying the same classification to unknown data.

Convolutional neural networks, or CNNs, are a type of neural network in which the connection pattern between neurons is inspired by the visual cortex of animals. They are thus particularly suited to a specific type of task, namely image analysis: they effectively allow the recognition of elements of interest such as objects or people in images, in particular in security applications (automatic surveillance, threat detection, etc.). For this, the CNN is trained on a base of training images, that is to say images in which the elements of interest have already been "annotated", i.e. highlighted and labeled with the corresponding category.

A known use of CNNs is so-called "tracking", that is to say the temporal following of these elements of interest, such as people or objects. More precisely, the objective is to analyze a sequence of successive images (typically the frames of a video, for example acquired by a surveillance camera) and to identify the same element present in several images so as to follow its displacement. Each identified element (for example each face) is assigned a unique identifier (typically an integer) common to all the frames. Thus, one can for example recognize a suspect by his face and follow his movement step by step through a crowd.
Today, these techniques give satisfaction but can still be improved. In particular, there is a problem of "association" between elements when one is a sub-part of the other. For example, it seems obvious that each person is associated in a unique and permanent way with a face, or similarly that each vehicle is associated in a unique and permanent way with a license plate, but this link is sometimes complex to maintain. A first intuition is indeed to start from the principle (in the face/person example) that a face detection must be included in the person's box, and therefore to associate a face with a person detection that contains this face detection. This solution is not reliable, because a pedestrian detection (the bounding box) can "contain" several faces depending on the context and the arrangement of the people. For example, with reference to Figures 1a-1b (which represent two successive frames of the same scene), an adult can hold a baby in his arms, and the adult's box then contains two close face detections. Thus, on the one hand the algorithm does not know which face to associate with the person detection, and on the other hand this association can be disturbed over time. In the example of Figure 1a, the correct face is associated with the correct person under the identifier 308, and the baby's face has the identifier 311; but when the nose of this person enters the baby's face box (case of Figure 1b), we observe a switch in the face-person association: the baby's face is assigned the identifier 308 of the adult, while the adult's face no longer has any associated person and is assigned the new identifier 316. Even if the association is reestablished a few images later, we are left with more identifiers created than real elements (we no longer know which association to believe), which distorts the entire tracking.
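The ambiguity of the naive containment-based association can be sketched as follows (a minimal illustration, not taken from the patent; the boxes and identifiers are hypothetical, chosen to mirror the adult-and-baby example above):

```python
# Sketch (not from the patent): naive containment-based association.
# Boxes are (x, y, w, h). An adult's person box can contain two face
# boxes (e.g. an adult holding a baby), so the rule "associate the face
# inside the person box" is ambiguous.

def contains(outer, inner):
    """True if box `inner` lies entirely inside box `outer`."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

person = (0, 0, 100, 220)           # adult's bounding box
faces = {308: (30, 10, 30, 30),     # adult's face
         311: (45, 60, 20, 20)}     # baby's face, also inside the box

candidates = [fid for fid, box in faces.items() if contains(person, box)]
# Both faces are contained: the naive rule cannot decide which to pick.
```

Both identifiers end up in `candidates`, which is exactly the ambiguity that leads to the identifier switch described above.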
It would therefore be desirable to have a new solution for associating an element with a sub-part of this element which is simple, reliable and universal, and this without additional computation-time cost.

PRESENTATION OF THE INVENTION

According to a first aspect, the present invention relates to a method for tracking elements of interest visible in a video consisting of a sequence of K images, characterized in that it comprises the implementation, by data processing means of a terminal, of steps of: (a) association of each element of interest of a first category visible in a k-th image of said video with an element of interest of a second category, different from the first category, visible in said k-th image; (b) calculation of an association cost for a plurality of pairs of an element of interest of the first category visible in at least one image of the video with an element of interest of the second category visible in at least one image of the video, as a function of at least the association counters of the pairs of an element of interest of the first category with an element of interest of the second category; (c) implementation of a combinatorial optimization algorithm as a function of the calculated association costs, so as to reassociate each element of interest of the first category visible in said k-th image with an element of interest of the second category visible in said k-th image; (d) updating of the association counters.
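Steps (b)-(d) can be sketched as follows (a minimal illustration under our own assumptions: square cost matrix, counter-based cost, brute-force search over pairings; the patent names the Hungarian algorithm as the combinatorial optimization algorithm, and a production version would use such an implementation, e.g. scipy's `linear_sum_assignment`):

```python
# Minimal sketch of steps (b)-(d); all names are ours.
# `ac[i][j]` is the association counter of the pair (i-th element of the
# first category, j-th element of the second category). The cost follows
# a counter-based formula c_ij = 1 - 2*ac_ij / (sum_l ac_il + sum_l ac_lj).
from itertools import permutations

def association_cost(ac, i, j):
    """Step (b): cost from the association counters (low = often paired)."""
    denom = sum(ac[i]) + sum(row[j] for row in ac)
    return 1.0 - (2.0 * ac[i][j] / denom if denom else 0.0)

def reassociate(ac):
    """Step (c): minimum-cost pairing; step (d): increment kept counters.
    Brute force over permutations stands in for the Hungarian algorithm
    (assumes as many elements per category, for brevity)."""
    n = len(ac)
    best = min(permutations(range(n)),
               key=lambda p: sum(association_cost(ac, i, p[i]) for i in range(n)))
    pairs = [(i, best[i]) for i in range(n)]
    for i, j in pairs:
        ac[i][j] += 1
    return pairs

# The (0,0) and (1,1) pairs have been seen together often, so they keep
# a low cost and survive a one-frame glitch in the per-image step (a).
pairs = reassociate([[5, 1], [0, 4]])
```

The counters thus give the association a memory: a pairing confirmed over many images dominates a single erroneous per-image association.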
According to other advantageous and non-limiting characteristics:
• the method is repeated iteratively for each image k ∈ [1; K] of the video;
• two associated elements of interest are considered to be part of the same entity;
• one of the first and second categories of element of interest is a sub-part of the other;
• either one of the first and second categories is the face category and the other is the person category, or one of the first and second categories is the license plate category and the other is the vehicle category or a subcategory of the vehicle category;
• said combinatorial optimization algorithm is the Hungarian algorithm;
• each element of interest is referenced with an identifier, two associated elements of interest being referenced with the same identifier;
• step (a) comprises the detection of at least one element of interest of a first category visible in said image and of at least one element of interest of a second category different from the first category visible in said image, by means of at least one convolutional neural network, CNN, and the association of each detected element of interest of the first category with a detected element of interest of the second category;
• the method comprises the prior implementation of a method for learning parameters of said CNN by data processing means of at least one server, for the detection of elements of interest visible in images, from at least one training image base in which said elements of interest as well as characteristic geometric structures are already annotated, the CNN comprising an encoding layer for the generation of a representation vector of the detected elements of interest, said representation vector comprising, for at least the first category of element of interest, at least one descriptive value of at least one geometric structure characteristic of said first category of element of interest;
• said representation vector comprises two position values and a visibility value of the at least one geometric structure characteristic of said given category of element of interest;
• said characteristic geometric structure is a characteristic point;
• said representation vector comprises descriptive values of at least three geometric structures characteristic of said first category of element of interest;
• the second category is a sub-part of the first category, the geometric structures characteristic of the first category of element of interest also being geometric structures characteristic of the second category of element of interest;
• the method further comprises the detection, for each detected element of interest of the first category, of one or more geometric structures characteristic of said first category of element of interest visible in said image;
• the method comprises the calculation, for each pair of a detected first element of interest of the first category and a detected second element of interest of the second category, of an overlap score between a box of said second element and the characteristic geometric structure or structures of said first category of element of interest for the first element;
• the association is carried out using a combinatorial optimization algorithm based on the calculated overlap scores;
• the method comprises the prior implementation of a method for learning parameters of a convolutional neural network, CNN, by data processing means of at least one server, for the detection of elements of interest visible in images, the method being implemented from a plurality of training image bases in which said elements of interest are already annotated, the CNN being common to said plurality of training image bases and having a common core and a plurality of encoding layers each specific to one of said plurality of training image bases;
• each encoding layer is a convolution layer, in particular with filters of size 1x1, or a fully connected layer, generating a representation vector of the detected elements of interest;
• each training image base is associated with a set of categories of element of interest, the elements of interest detected in the images of a base being those belonging to a category of the set associated with that base, said sets of categories differing from one base to another;
• said plurality of training image bases comprises at least a first base, a second base and a third base, the set of element-of-interest categories associated with the first base comprising the face category, the set associated with the second base comprising the person category, and the set associated with the third base comprising the vehicle category or at least one subcategory of the vehicle category;
• an association cost is calculated in step (b) for each pair of an element of interest of the first category detected in the k-th image with an element of interest of the second category already detected in at least one image of the video such that the association counter of said pair is non-zero, and for each pair of an element of interest of the second category detected in the k-th image with an element of interest of the first category already detected in at least one image of the video such that the association counter of said pair is non-zero;
• if step (a) comprises the detection in the k-th image of at least a first element of interest, a second element of interest and a third element of interest, the first element of interest being of the first category and the second and third elements of interest being of the second category, and if the first element of interest is associated with the second element of interest, step (c) comprises, depending on the result of the implementation of said combinatorial optimization algorithm:
- either maintaining the association of the first element of interest with the second element of interest, the association counter of the first element of interest with the second element of interest then being incremented in step (d);
- or reassociating the first element of interest with the third element of interest in place of the second element of interest, the association counter of the first element of interest with the third element of interest then being incremented in step (d);
• if step (a) comprises the identification in the k-th image of at least a first element of interest and a second element of interest, but not that of a third element of interest, the first element of interest being of the first category and the third element of interest being of the second category, and if the first element of interest is associated with the second element of interest, step (c) comprises, depending on the result of the implementation of said combinatorial optimization algorithm:
- either maintaining the association of the first element of interest with the second element of interest, the association counter of the first element of interest with the second element of interest then being incremented in step (d);
- or reassigning the identifier of the third element of interest to the second element of interest, the association counter of the first element of interest with the third element of interest then being incremented in step (d);
• a non-incremented association counter is decremented in step (d);
• the association cost of the i-th element of interest of the first category visible in said k-th image with the j-th element of interest of the second category visible in said k-th image is obtained in step (b) by one of the following formulas:
c_ij = 1 − 2·ac_ij / (Σ_{l=1}^{m} ac_il + Σ_{l=1}^{n} ac_lj), or
c_ij = 1 − 2·ac_ij·sd_i·sd_j / (Σ_{l=1}^{m} sd_i·ac_il + Σ_{l=1}^{n} sd_j·ac_lj);
with n the number of elements of interest of the first category and m the number of elements of interest of the second category visible in said k-th image, and sd_i and sd_j respectively the detection scores of said i-th element of interest of the first category and of said j-th element of interest of the second category.

According to a second and a third aspect, the invention provides a computer program product comprising code instructions for the execution of a method according to the first aspect for tracking elements of interest visible in a video; and computer-readable storage means on which a computer program product comprises code instructions for the execution of a method according to the first aspect for tracking elements of interest visible in a video.

PRESENTATION OF THE FIGURES

Other characteristics and advantages of the present invention will appear on reading the following description of a preferred embodiment. This description will be given with reference to the appended drawings, in which:
- Figures 1a and 1b show two examples of association of elements of interest in video images using a known method;
- Figure 2 is a diagram of an architecture for the implementation of the methods according to the invention;
- Figure 3 illustrates the steps of a preferred embodiment of a tracking method according to the invention;
- Figures 4a-4b show two cases of incorrect association/identification of elements of interest and how the tracking method according to the invention resolves these cases;
- Figure 5 shows an example of CNN architecture for the implementation of an embodiment of a detection method of the invention;
- Figure 6 schematically illustrates the implementation of a learning method according to a preferred embodiment of the invention;
- Figure 7a shows an example of a representation vector generated during the implementation of an association method according to a preferred embodiment of the invention;
- Figure 7b shows an example of association of elements of interest in an image using an association method according to a preferred embodiment of the invention.

DETAILED DESCRIPTION

Notions

With reference to FIG.
3, which will be described later, according to several complementary aspects the present invention may involve:
- a method of learning a convolutional neural network (CNN);
- a method of detecting elements of interest visible in an image;
- a method of associating elements of interest visible in an image;
- a method of tracking elements of interest visible in a video made up of a sequence of K images (i.e. frames).

Here, the term "element of interest" designates any representation in the image/video of an entity whose detection/association/tracking is desired in an image/video. Each element of interest is of a given category, corresponding to a type in the semantic sense. For example, the categories person, face, vehicle, license plate, etc. can be considered: the vehicle category covers the set of all vehicles (car, truck, bus, etc.) regardless of model, color, etc.; the license plate category covers the set of all plates whatever the country or region of issue, color, etc.

"Detection", or "recognition", is the most basic operation and refers to the simple marking of an element of interest of a known category in an image. Detection thus combines localization (determination of the position and size of a box enclosing the element of interest, known as a detection box) and classification (determination of its category).

By "tracking" is meant, as explained before, the following of these elements of interest for the duration of the video, that is to say the continuous identification of the same detected element from image to image, wherever it is present, so as to determine the displacement of the corresponding entity over time. For this, each element of interest is referenced with an identifier, the set of occurrences of an element of interest for a given identifier being called a "track".
We distinguish "detection" from "identification": while detection is done image by image and does not distinguish between the different elements of the same category, identification assigns the correct identifiers to the detections, so that two detections of the same entity in two different images have the same identifier, i.e. are part of the same track. For example, supposing that in a first image two elements of category person are identified as "person 1" and "person 2", and that two elements of category person are again detected in a second image, identification makes it possible to determine which, in the second image, is person 1 or person 2 (or even a new person 3). Identification can in other words be seen as the matching of a detected element with an entity, that is to say the distinction between the different elements of the same category detectable in one or more images.

It will be understood that in "perfect" tracking an entity should be identified in a unique and constant way by the same element of interest, i.e. there should be an exact correspondence between element of interest and entity; but in practice an entity can be associated over time with several elements of interest (constituting duplicates), or an element of interest can even change entity (confusion of two entities), see below. These are inaccuracies in tracking that the present method effectively resolves.

By "association" is meant the matching of two elements of interest of different but related categories. A first element of interest of a first category and a second element of interest of a second category can be associated if they have a link, in particular if they are part of the same entity. In general, two associated elements of interest of different categories are referenced by the same identifier, i.e. there is a unique identifier per entity, as is the case in the examples of Figures 1a, 1b.
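The notions above can be sketched as simple data structures (a minimal illustration under our own naming, not the patent's): a detection is per-image and category-level, identification attaches a track identifier, and two associated elements of different categories share the same identifier.

```python
# Sketch (names are ours) of the notions of detection, track and association.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Detection:
    category: str                      # e.g. "person" or "face"
    box: tuple                         # (x, y, w, h) detection box
    identifier: Optional[int] = None   # set by identification, not detection

@dataclass
class Track:
    identifier: int
    occurrences: list = field(default_factory=list)  # one Detection per image

p = Detection("person", (0, 0, 100, 220), identifier=308)
f = Detection("face", (30, 10, 30, 30), identifier=308)
associated = p.identifier == f.identifier   # same entity -> same identifier
```

Here the person detection and the face detection carry the same identifier 308, so they are associated: there is a single identifier per entity.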
In the following description, we will consider the preferred embodiment of an association of a "sub-part" nature. In other words, one of the first and second categories of element of interest is a sub-part of the other, i.e. a part of it. Arbitrarily, the present description considers the second category to be a sub-part of the first category, but the reverse could naturally be considered. In one example, the second category is the face category and the first category is the person category. In another example, the second category is the license plate category and the first category is the vehicle category.

Note that the invention is not limited to an association of a sub-part nature, and one could for example consider an association of categories which are themselves sub-parts of a third category (for example a face-hand association). One can even consider cases of association where there is no part/sub-part relationship, whether direct or indirect, for example person and background.

In a case where there are more than two categories of element of interest which can be associated (for example person/face/hand), in particular a first category, a second category and a third category, it suffices to define a main category (the "part") and secondary categories (the "sub-parts"), and each secondary category will be associated with the main category. For example, with person/face/hand, each hand will be associated with a person and each face will be associated with a person, but we will not try to associate hands and faces (insofar as this association is known by transitivity from the other two).

In the context of the present invention, the aim is more precisely the simultaneous tracking of at least two categories of elements of interest, in particular so as to associate them over time.
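The main/secondary scheme with transitivity can be sketched as follows (a minimal illustration; the categories, element names and the `assoc` mapping are our own hypothetical examples):

```python
# Sketch (ours): with one main category ("part") and several secondary
# categories ("sub-parts"), each secondary element is associated with a
# main element only; secondary-secondary links follow by transitivity.
MAIN = "person"
SECONDARY = ["face", "hand"]

# identifier of the associated main element, keyed by (category, element id)
assoc = {("face", "F1"): "P1", ("hand", "H1"): "P1", ("hand", "H2"): "P2"}

def same_entity(a, b, assoc):
    """Two secondary elements belong to the same entity iff they are
    associated with the same main element (transitivity through MAIN)."""
    return assoc[a] == assoc[b]

linked = same_entity(("face", "F1"), ("hand", "H1"), assoc)
```

Face F1 and hand H1 are both associated with person P1, so they are linked without any direct face-hand association ever being computed.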
Again, it will be understood that in "perfect" tracking two elements of interest of different categories should be associated if and only if they are part of the same entity; but in practice, and in particular in the event of an identification error, elements of interest may be associated when they actually correspond to two different entities, or conversely elements of interest corresponding to the same entity may not be associated. The present method also solves these problems.

Figures 4a and 4b represent two examples of inaccurate associations, and how they are "corrected" through the implementation of the present tracking method, which will be described later. In the figures, the elements of interest referenced Pi (P1, P2 and P4) are of the person category (the "first" category), and the elements of interest referenced Fj (F1, F2 and F3) are of the face category (the "second" category). It is assumed that P1 and P4 are elements of interest forming duplicates (they identify the same first person), and that P2 validly identifies a second person. It is also assumed that F1, F2 and F3 validly identify the respective faces of the first, second and third persons.

In Figure 4a, we have on one side an exact association of P1 with F1 (the first person and his face), and on the other side an inexact association of P2 with F3 instead of F2 (the face of the third person is associated with the second person). There is no problem in the tracks, but the association is to be improved. In Figure 4b, this time we have on one side an exact association of P2 with F2 (the second person and his face), and on the other side an association of P4 with F1. This last association is not completely inaccurate, since P4 identifies the first person; however, the first person should be identified by the "original" track P1 and not the duplicate P4.

The present methods are implemented within an architecture as represented by FIG.
2, thanks to one or more servers 1a, 1b, 1c and a terminal 2. As will be seen, the method can include the learning of one or more convolutional neural networks, CNN, in which case the server or servers 1a, 1b, 1c are the associated learning devices. The terminal 2 is in turn the operating equipment proper (that is to say, implementing all or part of the present method), for example video-surveillance data processing equipment. In all cases, each piece of equipment 1a, 1b, 1c, 2 is typically remote computer equipment connected to a wide area network 10 such as the internet for the exchange of data. Each comprises data processing means 11a, 11b, 11c, 21 of processor type, and data storage means 12a, 12b, 12c, 22 such as a computer memory, for example a disk.

At least one of the servers 1a, 1b, 1c stores a training database, i.e. a set of training images, that is to say images on which elements of interest have already been annotated and labeled with the corresponding element category (as opposed to the so-called input video on which tracking is attempted). Preferably, there are at least two, or even at least three, training image databases, stored on as many separate servers (example of two servers 1a and 1b in Figure 2). In Figure 2, the server 1c is an optional server which does not have a training image base and which implements the obtaining of the CNN(s) from the databases of the servers 1a, 1b. The role of this server 1c can however be completely fulfilled by either of the servers 1a, 1b.
CNN

A CNN generally contains four types of layers which process the information successively:
- the convolution layer, which processes the blocks of the input one after the other;
- the non-linear layer, which makes it possible to add non-linearity to the network and therefore to have much more complex decision functions;
- the pooling layer, which makes it possible to group several neurons into a single neuron;
- the fully connected layer, which connects all the neurons of one layer to all the neurons of the previous layer.

The activation function of the non-linear layer NL is typically the ReLU (Rectified Linear Unit) function, equal to f(x) = max(0, x), and the most used pooling layer (denoted POOL) is the MaxPool2x2 function, which corresponds to a maximum between four values of a square (four values are pooled into one). The convolution layer, denoted CONV, and the fully connected layer, denoted FC, generally correspond to a scalar product between the neurons of the previous layer and the weights of the CNN. Typical CNN architectures stack a few pairs of CONV -> NL layers, then add a POOL layer, and repeat this scheme [(CONV -> NL)^p -> POOL] until an output vector of sufficiently small size is obtained, then finish with one or two fully connected FC layers. In image analysis, there are not always NL non-linear layers or even fully connected FC layers. A person skilled in the art may for example refer to the CNNs described in the documents YOLO9000: Better, Faster, Stronger - Joseph Redmon, Ali Farhadi, https://arxiv.org/abs/1612.08242, and Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, https://arxiv.org/abs/1506.01497, or their derivatives.

In the rest of this description, we will take in particular the example of a CNN based on the "Darknet-19" architecture represented in Figure 5 and described in the document YOLO9000, which comprises 19 CONV convolution layers and 5 MaxPool2x2 pooling layers (alternatively, we can also cite the "Darknet-53" version, with 53 CONV convolution layers, or any architecture of the VGG, RESNET, DENSENET type, etc.).

Using the example of Darknet-19, we can build a detection CNN by taking its common core (i.e. the part extending up to the double horizontal line in Figure 5) and possibly adding three convolution layers with 1024 filters of size 3x3, and above all a last CONV convolution layer, advantageously having filters of size 1x1, which acts as a so-called "encoding" layer and has an output of size C (i.e. a number of filters equal to the size of the output representation vector, see below). Alternatively, a fully connected FC layer can be used as the encoding layer.

Indeed, the objective of a CNN-based detection method is to describe as precisely and repeatably as possible the content of an image in the form of a vector containing all the information on the elements to be detected; this is what the encoding layer allows. Thus, the encoding layer generates a representation vector of the detected elements of interest. The image is spatially divided into S cells (for example 7x7 cells), each cell having B "description boxes" (typically B = 2 or 3) which indicate the presence of one or more elements of interest (up to B elements) in this cell, and thus constitute "candidate" detection boxes. The right detection box is the one that best encloses (that is, as closely as possible) the corresponding element of interest. The presence of an entity in a description box translates at least into the data of its position in the cell and of its category, coded in the form of the so-called "representation" vector of C values generated by the encoding layer.
Generally, the vector includes at least five values:
- the x/y coordinates of the center of the description box (as a fraction of the cell size);
- the length/width w/h of the description box (as a fraction of the cell size);
- the identifier c of the category of the element of interest.

The total description code of an image is the concatenation of all the representation vectors of the description boxes, i.e. of length S*B*C.

With reference to Figure 7a, at least certain representation vectors (those for an element of interest of a given category, for example persons) are elongated, i.e. descriptive values of at least one geometric structure characteristic of said category are concatenated to them; in particular, said descriptive values advantageously comprise at least two position values (coordinates KPx/KPy) and/or a visibility value (boolean value KPv). In a particularly preferred manner, there are at least three characteristic geometric structures, that is to say at least nine additional descriptive values, as will be seen below. Thus, the detection of the characteristic points is accomplished simultaneously with the detection of the elements of interest, without additional time cost and without degradation of performance.

By "characteristic geometric structure" is meant in particular a characteristic point (in English, "keypoint"), but also a shape such as a polygon, a mesh, etc., and in general any graphic object easily identifiable on all the elements of this category. In a particularly preferred way, when one of the first category and the second category is a sub-part of the other, characteristic geometric structures common to the first category and the second category are chosen. In the face/person example, we could take for example the two eyes and the nose: indeed, these are geometric structures of a very particular shape, characteristic of both a person and a face.
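The layout of the representation vector described above can be sketched as follows (a minimal illustration with our own field names and example sizes; the 7x7 grid, B = 2 and three keypoints follow the examples in the text):

```python
# Sketch (ours) of the representation-vector layout: 5 base values
# (x, y, w, h, c) plus (KPx, KPy, KPv) per characteristic geometric
# structure; the full image code concatenates S*B such vectors.
S, B = 7 * 7, 2            # cells and description boxes per cell
N_KP = 3                   # e.g. left eye, right eye, nose
C = 5 + 3 * N_KP           # length of one representation vector

def parse_vector(v):
    """Split one C-value representation vector into its named fields."""
    assert len(v) == C
    x, y, w, h, c = v[:5]
    keypoints = [{"KPx": v[5 + 3 * k], "KPy": v[6 + 3 * k], "KPv": bool(v[7 + 3 * k])}
                 for k in range(N_KP)]
    return {"box": (x, y, w, h), "category": c, "keypoints": keypoints}

code_length = S * B * C    # total description code of one image
```

With S = 49, B = 2 and C = 14, the image description code has 1372 values; the three keypoints add the nine descriptive values mentioned above.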
The use of these common characteristic geometric structures very cleverly allows, as will be seen later, the association of elements of two categories in the image, and this in a very reliable way. In the following description, we will take the example in which said geometric structures are points; for example, the eyes or the nose of a face will be annotated as points. Advantageously, other information can be encoded on this representation vector (other bounding boxes, information on the action in progress, a vehicle registration plate number, etc.).

Learning method

Advantageously, a method for learning parameters of at least one convolutional neural network, CNN, is implemented for the detection of elements visible in images, from a plurality of training image bases in which said elements are already annotated, i.e. located and classified (the category is determined). Each image base is in fact advantageously associated with a set of categories of element of interest, the elements of interest annotated in the images of a base being those belonging to a category of the set of categories associated with that base. As explained before, for the elements of at least one given category, one or more characteristic geometric structures may already be annotated, i.e. their coordinates in the image known. It is understood that the characteristic geometric structures are not always visible and are therefore only indicated if they are visible. For example, a person in profile can be detected as a person-category element (his face too), but his left or right eye will not be visible because it is behind the head.

According to a first conventional mode, each CNN is learned from a single training base, for a subset of the set of categories associated with this base (even a single category, and in particular the first or second category). In other words, it learns to recognize one or more of the categories of elements already annotated in the training images of this base.
It is not possible to merge two training bases because they are "partially" annotated relative to each other. For example, if we consider a base of persons and a base of vehicles, the vehicles are not annotated in the base of persons and vice versa, which constitutes false negatives which would completely disrupt the learning. It would be necessary to add the missing annotations manually, which is a titanic amount of work. For example, the MS-COCO database (the most used) contains only annotations of persons, some animals and some objects, but no annotation of faces. We also cite for example the WIDER database, which contains only face annotations. In addition, if one of the first and second categories of element of interest is a sub-part of the other, even by creating an ad hoc base it would not be possible to simultaneously teach a CNN to detect the two categories because of their inclusion. Thus, if we are in a case where the first category is in the set of categories of a first base and the second category is in the set of categories of a second base, two CNNs are learned which constitute two independent "detectors". For example, on the one hand, a face-category element detector can be learned from a first base associated with a set of categories of element of interest comprising the face category (typically directly by the processing means 11a of the first server 1a if it is the one which stores the first base), and on the other hand a person-category element detector from a second base associated with a set of categories of element of interest comprising the person category (typically directly by the processing means 11b of the second server 1b if it is the one which stores the second base). Note that limiting the number of different categories detectable by a CNN thus reduces the necessary size of the output representation vector. 
According to a second, preferred embodiment, the problem of the incompatibility of the different bases is cleverly circumvented so as to have at least one common CNN learned directly from a plurality of training image bases, and this in a single training. This is advantageously achieved by the data processing means 11c of the server 1c connected to the other servers 1a, 1b of the bases. Said CNN is said to be "common" to several bases (in other words, there is a single CNN which learns at the same time from several bases), as opposed to conventional CNNs which can each only learn from one base. With reference to FIG. 6, said plurality of training image bases advantageously comprises at least a first training image base (in which at least the elements of interest of the first category are already annotated) and a second base (in which at least the elements of interest of the second category are already annotated), or even a third base. In particular, the set of categories of element of interest associated with the first base comprises the person category (the first category in the examples), the set of categories of element of interest associated with the second base comprises the face category (the second category), and the set of categories of element of interest associated with the possible third base comprises one or more categories of inanimate objects, such as the vehicle category or at least one vehicle subcategory (for example, the seven categories car, truck, bus, two-wheeler, bicycle, plane and boat). It is understood, however, that there is no limitation to any particular choice of bases/categories. For this, a CNN having a common core and a plurality of encoding layers each specific to one of said plurality of training image bases is used as the common CNN. In other words, as seen in Figure 6, the CNN architecture does not have an encoding layer common to all of the modalities (i.e. 
the different sets of categories), but an encoding layer specific to each of the modalities. In a particularly preferred manner, said common core comprises all of the layers having variable parameters other than the encoding layers, and in particular the initial layers. In the example in Figure 5, the common core extends to the horizontal double line. In other words, assuming that there are three training image bases as in the example in Figure 6, then there are three encoding layers, and for each training image taken as input the encoding layer corresponding to the base from which the training image comes is used. It is therefore understood that all the training images participate in the learning of the common core, but that only the images of a given base participate in the training of its encoding layer. The various encoding layers are, as explained, each advantageously composed of a convolution layer with filters preferably of size 1x1, and whose output size C (the number of filters) corresponds to the size of the representation vector (typically 8 for persons and faces, and 14 for vehicles if there are 7 subcategories as in the example above, plus said descriptive values of at least one characteristic geometric structure for at least one of them). The various encoding layers are typically arranged in parallel. In addition, advantageously, as shown in FIG. 6, a plurality of cost functions is used, again each specific to one of said plurality of training image bases. It is recalled that a cost function (called a "loss") specifies how the learning of the CNN penalizes the difference between the expected and the actual signal. More precisely, for an input datum (training image), the cost function makes it possible to quantify an "error" between the output obtained by the CNN (the elements detected) and the theoretical output (the annotated elements). 
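The "common core plus per-base encoding layer" routing described above can be sketched in a framework-free way as follows. This is only an illustrative sketch: the class, the toy trunk and the head/base names are assumptions, and the real layers would be convolutional.

```python
# Framework-free sketch of the common-core idea: every training image passes
# through the shared trunk, then only the encoding head matching the image's
# source base is applied. Head/base names and code lengths mirror the example
# above (8 for persons/faces, 14 for vehicles) but are otherwise assumptions.

class MultiHeadDetector:
    def __init__(self, trunk, heads):
        self.trunk = trunk    # shared layers, trained on images from all bases
        self.heads = heads    # dict: base name -> base-specific encoding layer

    def forward(self, image, base):
        features = self.trunk(image)        # common core
        return self.heads[base](features)   # per-base 1x1 encoding layer

# Toy stand-ins for the trunk and the three per-base encoding layers.
trunk = lambda img: [v * 2 for v in img]
heads = {
    "persons":  lambda f: {"base": "persons",  "code_len": 8,  "features": f},
    "faces":    lambda f: {"base": "faces",    "code_len": 8,  "features": f},
    "vehicles": lambda f: {"base": "vehicles", "code_len": 14, "features": f},
}
net = MultiHeadDetector(trunk, heads)
out = net.forward([1, 2, 3], base="vehicles")
```

The routing makes explicit that all images update the trunk while each head only ever sees images from its own base.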
The learning aims to modify the parameters of the CNN so as to gradually decrease the error as calculated by the cost function. Known examples include the softmax function (or normalized exponential function), the Huber function, norms such as L1, etc. To carry out the actual learning, the classic technique of gradient backpropagation propagates the calculated error "backwards" so as to update the parameters of all the layers. In this embodiment, different cost functions are used to do this depending on the base from which each training image comes. More precisely, training images are iteratively drawn at random from the plurality of bases (i.e. each image can come from any base), and the weights and parameters of the CNN are varied for each image on the basis of the cost function corresponding to the base from which it comes. In a particularly preferred manner, a so-called "batch" learning paradigm is implemented, that is to say that for a set of training images originating from the various bases, the errors are first calculated (with the corresponding cost functions) without updating the parameters; these various errors are then added, and once all the images of said set have passed once through the CNN, backpropagation is applied throughout the CNN using the total (summed) error. The common CNN can be used as a "multi-category" detector when applied to the video images. Of course, it is already possible to make multi-category detectors from a single base if it already has elements of several annotated categories, but one is then limited to these categories. The common CNN of this embodiment makes it possible to combine any training bases, and therefore to be completely free in the choice of the multiple categories. 
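The "batch" paradigm above, where per-image errors are computed with the cost function of each image's source base and summed before a single update, can be illustrated numerically. The one-parameter model and the two particular cost functions below are illustrative assumptions, chosen only to keep the sketch self-contained.

```python
# Minimal numeric sketch of the batch paradigm: errors are computed per image
# with the cost function of that image's source base, summed over the batch,
# and only then would the parameters be updated once with the total error.

def l1_loss(pred, target):          # cost function assumed for base "A"
    return abs(pred - target)

def squared_loss(pred, target):     # cost function assumed for base "B"
    return (pred - target) ** 2

def batch_error(weight, batch):
    """Sum base-specific errors over a mixed batch; toy model is pred = weight * x."""
    losses = {"A": l1_loss, "B": squared_loss}
    return sum(losses[base](weight * x, y) for base, x, y in batch)

# A mixed batch drawn from two bases; targets all satisfy y = 2 * x.
batch = [("A", 1.0, 2.0), ("B", 2.0, 4.0), ("A", 3.0, 6.0)]
total = batch_error(1.0, batch)   # 1.0 + 4.0 + 3.0 = 8.0
better = batch_error(2.0, batch)  # 0.0: weight 2 fits every image of both bases
```

A single backpropagation step on `total` would then move the shared parameters using the contributions of all bases at once.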
It is understood that the fact of multiplying the encoding layers and the cost functions makes it possible, without significantly increasing the size of the network, for no detection to be penalized by another and to have the same efficiency as with a plurality of detectors. In addition, there is a significant saving of time in learning since it can be simultaneous for all the bases. Note that it remains entirely possible, if one wishes to detect a large number of different categories, to learn other CNNs, whether each common to several bases or specific to one base. For example, one could have a first common multi-category CNN detector, and a second CNN detector dedicated to another category that is complex to identify, learned accordingly on a particular base. Alternatively or in addition, the learning of the CNN(s) can be implemented from at least one base of training images in which characteristic geometric structures are also already annotated, in particular the characteristic geometric structures of at least one given category (the first category). As explained before, the CNN then comprises an encoding layer for the generation of a representation vector of the elements of interest to be detected comprising, for at least said first category of element of interest to be detected, at least one (advantageously three, in particular coordinates and visibility) descriptive value of at least one (advantageously three) characteristic geometric structure (in particular a characteristic point) of said first category of element of interest. It will again be understood that not all the characteristic geometric structures are necessarily visible and that naturally only those which are can be detected. Thus, even if one tries to detect three characteristic points, one will not necessarily succeed for all three (but it will then be indicated which one or ones are not visible). 
Detection and association

The present invention relates in particular to a method of tracking elements of interest visible in a video made up of a sequence of K images, implemented by the data processing means 21 of the terminal 2. With reference to FIG. 3, the present tracking method begins with a step (a) of association of each element of interest of a first category visible in a k-th image of said video (frame) with an element of interest of a second category different from the first category visible in the k-th image. It is recalled that two associated elements are considered to be linked to, and in particular to form part of, the same entity, as explained above. Note that it is always possible for an "orphan" element of the first or second category to remain if different numbers of elements of the first and second categories have been detected, i.e. if for example the one with which an element should have been associated is hidden or its detection failed. For example, in Figure 1a, the baby is not detected (only his face is) and therefore we have an orphan face. Each element of interest is advantageously referenced with an identifier, and preferably the association of two elements results in assigning to the second the identifier of the first (i.e. both are referenced under the same identifier). It is known to use associations to carry out tracking, but as explained, tracking can be distorted if a bad association occurs. Thus, the present method implements a correction of the associations frame by frame. Indeed, even if an association for an image taken in isolation can be optimal, it may be inconsistent with the previous images. 
More precisely, we start from an association obtained in step (a), and it is this association which will be examined. Preferably, the method is repeated iteratively for each image k ∈ [1; K] of the video so as to carry out continuous tracking, although it will be understood that it can be implemented only from time to time to correct the associations. In the present description, we will take the example of the sequence of images k and k+1, but this naturally transposes to any pair of successive images. Preferably, step (a) comprises the detection of at least one element of interest of a first category visible in said k-th image and at least one element of interest of a second category visible in said k-th image. Those skilled in the art can use any known technique to implement this detection, and in particular a CNN as explained, preferably trained by means of a learning method as described above. According to a first embodiment, one CNN adapted to each category is used. In other words, each image is processed as many times as there are expected categories so as to detect all the elements of interest of all the categories. According to the second embodiment, at least one common "multi-type" CNN is used for all or part of the categories to be detected. Then, each element of interest of the first category detected in said k-th image is associated with an element of interest of the second category detected in said k-th image. The association can be implemented in a conventional manner (typically by detecting the inclusion of a second element in the first element), but in a particularly preferred manner, an innovative method of association of elements of interest in an image will be used, involving the CNN with elongated representation vector described above. 
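The overall per-frame flow of steps (a) to (d), as listed in the abstract, can be sketched as a skeleton. Every helper below is a deliberately trivial stand-in (a naive cost and a greedy optimizer instead of the Hungarian algorithm), and all names are assumptions; only the ordering of the four steps reflects the method.

```python
# Skeleton of one iteration of the tracking method over a frame. The stubs
# are placeholders, not the patent's implementation: real code would detect
# with a CNN and reassociate with a combinatorial optimization algorithm.

def detect(frame):
    return frame["persons"], frame["faces"]

def initial_association(persons, faces):
    return list(zip(persons, faces))          # naive first pairing

def association_costs(persons, faces, counters):
    # lower cost for pairs already seen: counter-driven "inertia"
    return {(p, f): 1.0 / (1 + counters.get((p, f), 0))
            for p in persons for f in faces}

def optimize(costs, persons, faces):
    # greedy stand-in for a global combinatorial optimization
    return [(p, min(faces, key=lambda f: costs[(p, f)])) for p in persons]

def update_counters(counters, pairs):
    for pair in pairs:
        counters[pair] = counters.get(pair, 0) + 1

def track_frame(frame, counters):
    persons, faces = detect(frame)                       # step (a): detection
    pairs = initial_association(persons, faces)          # step (a): association
    costs = association_costs(persons, faces, counters)  # step (b): costs
    pairs = optimize(costs, persons, faces)              # step (c): reassociation
    update_counters(counters, pairs)                     # step (d): update
    return pairs

# P1 has already been paired with F2 three times, so step (c) overrides the
# naive step (a) pairing (P1, F1) in favour of the historical pair.
counters = {("P1", "F2"): 3}
pairs = track_frame({"persons": ["P1"], "faces": ["F1", "F2"]}, counters)
```

The example shows the frame-by-frame correction idea: the association history, not the isolated frame, decides the final pairing.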
In this embodiment, there are detected, visible in said image, at least one element of interest of a first category, at least one element of interest of a second category different from the first, as well as, for each element of interest of the first category detected, the characteristic geometric structure(s) of said first category of element of interest (i.e. the characteristic geometric structure(s) associated with each element of the first category). These are chosen such that the second category is a sub-part of the first category, and that the geometric structures characteristic of the first category of element of interest are also geometric structures characteristic of the second category of element of interest. As already explained, this is for example the case of the points of the face such as the nose and the eyes. Then, following the detection, for each pair of a first element of interest of the first category detected and a second element of interest of the second category detected, a recovery score is preferably calculated between a box of said second element and the geometric structure(s) characteristic of said first category of element of interest for the first element. The clever idea is not to compare the elements of the first category and the elements of the second category directly but, starting from the principle that the second category is a sub-part of the first category and that the geometric structures characteristic of the first category of element of interest are also characteristic geometric structures of the second category of element of interest, to compare the characteristic geometric structures of the first category directly with the elements of the second category: the characteristic geometric structures can be seen as a "second detection" of an element of the second category, which it is easy to match with it. 
By "recovery score" is meant any metric representative of the correspondence between a box and characteristic geometric structures, i.e. one that increases as the characteristic geometric structures are increasingly included in the box. According to a first embodiment, the recovery score of a pair of a first element with a second element can simply be equal to the number of characteristic geometric structures for the first element which are included in the box of the second element (possibly normalized by dividing by the total number of characteristic geometric structures). For example, in the example in Figure 1b, we have a recovery score of 1/3 with each of the faces of the mother and child, since each of the corresponding boxes includes one characteristic point of the mother (such as the nose or the left eye, the right eye not being visible). According to a second embodiment, in particular if three characteristic geometric structures, and in particular characteristic points, are detected, the score is a recovery rate between said box of the second element and a convex envelope of the geometric structures characteristic of said first category of element of interest for the first element, that is to say a ratio based on the corresponding areas. We see in Figure 6b said convex envelope of three characteristic points of said first category of element of interest for the first element in an example image. We can very advantageously use the Jaccard criterion, that is to say the ratio between the intersection (of the box and the convex envelope) and the union (of the box and the convex envelope), also called in English "Intersection Over Union" (IOU). 
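The first recovery-score mode described above, the normalized count of keypoints falling inside a candidate box, can be sketched directly. The coordinate convention (box as corner coordinates) and the function name are assumptions for illustration.

```python
# Sketch of the first recovery-score mode: the fraction of a first element's
# visible characteristic keypoints that fall inside the candidate box of a
# second element. Coordinate conventions are illustrative assumptions.

def recovery_score(keypoints, box):
    """keypoints: [(x, y), ...] visible keypoints of the first element.
    box: (x_min, y_min, x_max, y_max) of the second element."""
    if not keypoints:
        return 0.0
    x_min, y_min, x_max, y_max = box
    inside = sum(1 for x, y in keypoints
                 if x_min <= x <= x_max and y_min <= y <= y_max)
    return inside / len(keypoints)

# Three annotated keypoints (e.g. both eyes and the nose); only the nose
# lands in this face box, giving the 1/3 score mentioned above.
kps = [(0.2, 0.2), (0.8, 0.2), (0.5, 0.5)]
score = recovery_score(kps, (0.4, 0.4, 0.6, 0.6))  # -> 1/3
```

The second mode would replace this count by the IOU between the box and the convex envelope of the keypoints, as described next.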
For example, noting KP_i^cvx the convex envelope of the characteristic geometric structures for the i-th element of interest of the first category and F_j the j-th element of interest of the second category, the recovery score is given by the formula scIOU_ij = |F_j ∩ KP_i^cvx| / |F_j ∪ KP_i^cvx|. Finally, a combinatorial optimization algorithm can be implemented as a function of the recovery scores calculated so as to associate each element of interest of the first category detected with an element of interest of the second category detected. By combinatorial optimization algorithm (also called discrete optimization) is meant an algorithm capable of finding a global solution to the association problem, i.e. of finding the optimal combination of pairs among all possible combinations of pairs, optimal being understood in terms of "total cost": one can base oneself, for example, on a cost such as 1 − scIOU_ij (and generally any decreasing function: the higher the recovery score, the lower the association cost). Many combinatorial optimization algorithms are known, and preferably the Hungarian algorithm will be used, which is particularly suitable for the present case (alternatively, the Ford-Fulkerson algorithm may be cited, for example). Note that one can always use a "naive" optimization algorithm in which one is content to associate with each element of the first category the element of the second category with which the recovery score is maximum, although very close boxes (a typical case for faces) as well as inaccuracies in the detection of characteristic geometric structures can then cause association errors. 
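The global association over the cost 1 − scIOU_ij can be sketched as follows. For clarity the sketch searches all permutations exhaustively, which is only a small-scale stand-in for the Hungarian algorithm mentioned above (at scale one would use a dedicated linear-assignment routine); the square-matrix restriction is an assumption.

```python
# Sketch of the combinatorial association step: find the pairing minimizing
# the total cost 1 - scIOU. Brute force over permutations stands in for the
# Hungarian algorithm and is only viable for small numbers of elements.
from itertools import permutations

def best_assignment(scores):
    """scores[i][j] = recovery score of first-category element i with
    second-category element j (square matrix assumed for simplicity)."""
    n = len(scores)
    best, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(1 - scores[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(enumerate(best))   # (first-category index, second-category index)

# Element 0 overlaps face 0 most, element 1 overlaps face 1 most: the global
# optimum keeps both high-score pairs.
scores = [[0.9, 0.3],
          [0.4, 0.8]]
pairs = best_assignment(scores)
```

Unlike the "naive" per-element maximum, this global search cannot assign the same second-category element twice, which is exactly why a combinatorial algorithm is preferred.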
Association cost

In a step (b), the data processing means 21 calculate a "cost of association" of each pair of an element of interest of the first category detected in at least one image with an element of interest of the second category detected in at least one image, as a function of at least the association counters of each pair of an element of interest of the first category with an element of interest of the second category. It should be noted that while step (b) is typically implemented only for the elements of interest detected (already associated) in the current k-th image, it can very advantageously also include any elements detected in the previous ((k−1)-th) image but "disappeared" (not visible), or even all the elements of interest having at least one association counter with a non-zero value (with one of the elements detected in the current k-th image); see below. Preferably, an intermediate position is adopted by providing that an association cost is calculated for each pair of an element of a category detected in the k-th image with an element of interest of another category having with this element a non-zero association counter, i.e. an association cost is calculated for each pair of an element of the first category detected in the k-th image with an element of interest of the second category already detected in at least one image of the video such that the association counter of said pair is not zero, and for each pair of an element of the second category detected in the k-th image with an element of interest of the first category already detected in at least one image of the video such that the association counter of said pair is not zero. By association counter of a pair of elements of interest of two different categories is meant the number of times that the two elements of this pair have previously been associated. 
The association counter of a pair is typically incremented for each image in which the association is obtained, and otherwise maintained or even advantageously decremented (the counter always remaining at least equal to 0). A "provisional" update of the counters can take place after the association of step (a). It is understood that there can be as many counters as possible pairs. In the example where there are four elements of interest of the first category referenced P_i, i ∈ [1;4] (person category), and three elements of interest of the second category referenced F_j, j ∈ [1;3] (face category), there can be up to twelve association counters denoted ac_ij, (i,j) ∈ [1;4]×[1;3]. Note that preferably, since the vast majority of combinations will never exist, only the counters of an association that has taken place at least once in the video are kept. In a case where there are more than two categories of element of interest that can be associated (for example, person/face/hand), a set of association counters is used per entity (in the example, person/hand counters and person/face counters). The association cost can be understood as the application of a cost function to the association counters. It is representative of a "cost" necessary to make an association among all the associations, i.e. its difficulty, and makes it possible to express the counter of a pair with respect to all the counters. Thus, the lower the cost of association for a given pair, the more likely this association is to be the right one. Those skilled in the art will be able to use any known cost function. According to a first embodiment, the association cost depends only on the association counters. 
Noting C_ij the cost of association of the i-th element of interest of the first category with the j-th element of interest of the second category (pair P_i − F_j), n the number of elements of interest of the first category and m the number of elements of interest of the second category, one can use for example the following formulas:

C_ij = 1 − ac_ij / (Σ_{l=0..m} ac_il + Σ_{l=0..n} ac_lj)

C_ij = Σ_{l≠j} ac_il + Σ_{l≠i} ac_lj

These formulas reflect a certain "inertia" of past associations. More specifically, we see that the cost of maintaining a pair is always lower than the cost of creating a new pair. The association counters act as a history, and the use of these cost functions thus promotes stability. According to a second embodiment, the association counters are weighted by detection scores (noting sd_i and sd_j the detection scores of the i-th element of the first category and of the j-th element of the second category respectively), i.e. the association cost is a function (only) of the association counters and the detection scores. The following formulas can be used, for example:

C_ij = 1 − (sd_i · sd_j · ac_ij) / (Σ_{l=0..m} sd_i · ac_il + Σ_{l=0..n} sd_j · ac_lj)

C_ij = Σ_{l≠j} sd_i · ac_il + Σ_{l≠i} sd_j · ac_lj

Alternatively, the recovery score as presented above can be used, for example with the formulas:

C_ij = 1 − (scIOU_ij · ac_ij) / (Σ_{l=0..m} scIOU_il · ac_il + Σ_{l=0..n} scIOU_lj · ac_lj)

C_ij = Σ_{l≠j} scIOU_il · ac_il + Σ_{l≠i} scIOU_lj · ac_lj

Weighting by the scores can limit the effect of inertia and still favor the creation of a new pair in the event of particularly high scores.

Combinatorial optimization and update

In a step (c), the data processing means 21 implement a combinatorial optimization algorithm as a function of the association costs calculated so as to reassociate each element of interest of the first category detected in said k-th image with an element of interest of the second category detected in said k-th image. This is a "check" of the initial association of step (a), and it is entirely possible that step (c) merely confirms it. 
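One simple counter-based form of the cost, C_ij as the total weight of the competing past associations (the sum of ac_il over l ≠ j plus the sum of ac_lj over l ≠ i), can be sketched as follows. The exact formula and all names are a reconstruction/assumption, but the sketch shows the inertia property claimed above: keeping an established pair is always cheaper than breaking it.

```python
# Sketch of a counter-only association cost: the cost of pairing (i, j) is
# the sum of all association counters of *competing* pairs involving i or j.
# Formula and naming are illustrative reconstructions, not the patent's text.

def association_cost(ac, i, j, n, m):
    """ac: dict (i, j) -> association counter; n first-category elements,
    m second-category elements; missing counters count as zero."""
    return (sum(ac.get((i, l), 0) for l in range(m) if l != j)
            + sum(ac.get((l, j), 0) for l in range(n) if l != i))

# The pair (P0, F0) has been associated 5 times: maintaining it costs 0,
# while re-pairing P0 with F1 must "pay" the full weight of that history.
ac = {(0, 0): 5}
keep = association_cost(ac, 0, 0, n=2, m=2)  # -> 0
swap = association_cost(ac, 0, 1, n=2, m=2)  # -> 5
```

Feeding such costs to the combinatorial optimizer of step (c) makes the historical pairing win unless the evidence against it accumulates.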
The same optimization algorithm as the one described above for the preferred embodiment of the association of step (a) can be used here, but not necessarily. Preferably, the Hungarian algorithm will again be used, which is particularly suitable for the present case (alternatively, the Ford-Fulkerson algorithm could be cited, for example). Preferably, step (c) further comprises reassigning identifiers to the elements of interest detected in said k-th image. Indeed, while again step (c) is typically implemented only for the elements of interest detected (already associated) in the current k-th image, the combinatorial optimization algorithm can very advantageously also include any elements detected in the previous ((k−1)-th) image but "disappeared" (not visible), or even all the elements of interest having at least one association counter with a non-zero value (with one of the elements detected in the current k-th image), and in general all the elements of interest involved in a pair for which an association cost was calculated in step (b). Indeed, although naturally only visible elements can be associated, it is possible that at the end of step (c) the combinatorial optimization algorithm matches a visible element with another element that is no longer visible. This situation means that one of the visible elements of interest (in this case the one associated at the end of step (a)) is a duplicate of the other element of interest, which is not visible. We will see an example of such a case below and how it is handled. In a final step (d), the association counters are updated according to the result of the associations. This may take into account a possible provisional update following step (a). In this update, at least the association counters of the pairs obtained (created or maintained) are incremented, and advantageously those (still non-zero) of the pairs not obtained (in particular abandoned) are decremented so as to amplify the effect. 
As explained above, the counters do not go negative, and therefore all spurious counters are ultimately disregarded.

Correction of association

A case of "bad association" can occur if, for example, the face of a person is poorly localized in the k-th image, so that the face of another person is associated with it by mistake (see Figure 4a). The present method makes it possible to correct this error. More precisely, if step (a) comprises the detection in the k-th image of at least a first element of interest, a second element of interest and a third element of interest, the first element of interest being of the first category and the second and third elements of interest being of the second category, and if the first element of interest is associated with the second element of interest, step (c) comprises, depending on the result of the implementation of said combinatorial optimization algorithm: - either the maintenance of the association of the first element of interest with the second element (the association is confirmed, case of the pair in FIG. 4a), the association counter of the first element of interest with the second element of interest then being incremented (and the association counter of the first element of interest with the third element of interest being decremented) in step (d); - or the reassociation of the first element of interest with the third element of interest in place of the second element of interest (detection of the bad association and rectification, typically what happens if the association counter of the first element with the third element is higher than the association counter of the first element with the second element, case of the pair P2-F2 in FIG. 4a), the association counter of the first element of interest with the third element of interest then being incremented (the association counter of the first element of interest with the second element of interest being decremented) in step (d). 
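The step (d) counter update driving this correction, incrementing obtained pairs and decrementing abandoned ones with a floor at zero, can be sketched as follows. The function name and the convention of starting newly created pairs at 1 are illustrative assumptions.

```python
# Sketch of the step (d) update: counters of pairs obtained in this frame are
# incremented; counters of pairs not obtained are decremented but floored at
# zero, so that stale or spurious associations are gradually forgotten.

def update_counters(counters, obtained_pairs):
    for pair in list(counters):
        if pair in obtained_pairs:
            counters[pair] += 1
        else:
            counters[pair] = max(0, counters[pair] - 1)
    for pair in obtained_pairs:            # newly created pairs start at 1
        counters.setdefault(pair, 1)
    return counters

counters = {("P1", "F1"): 3, ("P2", "F2"): 1, ("P2", "F1"): 0}
update_counters(counters, {("P1", "F1"), ("P2", "F1")})
```

Over successive frames, a wrongly created pair such as ("P2", "F2") decays back to zero while the confirmed pairs keep accumulating history.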
Merging tracks

A case of "track duplication" can occur if the same entity is successively detected as two distinct elements of interest, i.e. referenced by two different identifiers (a track was created unnecessarily), which is often a consequence of a wrong association (the "non-associated" element of interest starts a new track). The present method makes it possible to merge these two elements by forcing an association with the "original" element, which will cause the gradual disappearance of the duplicate. It is understood that in practice the association is not changed, but only the identifier of the duplicate. This is made possible by considering, in the optimization algorithm of step (c), non-detected elements that nevertheless have a non-zero association counter with a detected element (i.e. they have recently been associated with a visible element but no longer appear, which is suspect). Thus, if step (a) comprises the detection in the k-th image of at least a first element of interest and a second element of interest, but not of a third element of interest, the first element of interest being of the first category and the second element of interest and the third element of interest being of the second category, and if the first element of interest is associated with the second element of interest, step (c) comprises, depending on the result of the implementation of said combinatorial optimization algorithm: - either the "normal" maintenance of the association of the first element of interest with the second element (the association is confirmed, that is to say that the old association of the first element with the third was an error), the association counter of the first element of interest with the second element of interest then being incremented (the association counter of the first element of interest with the third element of interest then being decremented) in step (d); - or the reassignment of the identifier of the third element of 
interest to the second element of interest (it being understood that the association of the first element with the second element was not false, but that the latter is one and the same with the third element, whose track is therefore recovered; this is typically what happens if the association counter of the first element with the third element is higher than the association counter of the first element with the second element, case of the pair P4/P1-F1 in FIG. 4a), the association counter of the first element of interest with the third element of interest then being incremented (the association counter of the second element possibly being decremented, or even directly set to zero to delete this track) in step (d).

Computer program product

According to a second and a third aspect, the invention relates to a computer program product comprising code instructions for the execution (in particular on the data processing means 11a, 11b, 11c, 21 of one or more servers 1a, 1b, 1c or of the terminal 2) of a method according to the first aspect of the invention for tracking elements of interest visible in a video; as well as storage means readable by computer equipment (a memory 12a, 12b, 12c, 22 of one or more servers 1a, 1b, 1c or of the terminal 2) on which this computer program product is stored.
Claims (15)

1. A method for tracking elements of interest visible in a video consisting of a sequence of K images, characterized in that it comprises the implementation, by data processing means (21) of a terminal (2), of steps of:
(a) association of each element of interest of a first category visible in a k-th image of said video with an element of interest of a second category, different from the first category, visible in said k-th image;
(b) calculation of an association cost for a plurality of pairs of an element of interest of the first category visible in at least one image of the video with an element of interest of the second category visible in at least one image of the video, as a function of at least the association counters of the pairs of an element of interest of the first category with an element of interest of the second category;
(c) implementation of a combinatorial optimization algorithm as a function of the calculated association costs, so as to reassociate each element of interest of the first category visible in said k-th image with an element of interest of the second category visible in said k-th image;
(d) updating of the association counters.

2. The method according to claim 1, repeated iteratively for each image k ∈ [1; K] of the video.

3. The method according to one of claims 1 and 2, wherein two associated elements of interest are considered to be part of the same entity.

4. The method according to claim 3, wherein one of the first and second categories of elements of interest is a sub-part of the other.

5. The method according to claim 4, wherein either one of the first and second categories is the face category and the other is the person category, or one of the first and second categories is the license plate category and the other is the vehicle category or a subcategory of the vehicle category.

6. The method according to one of claims 1 to 5, wherein said combinatorial optimization algorithm is the Hungarian algorithm.

7. The method according to one of claims 1 to 6, wherein each element of interest is referenced with an identifier, two associated elements of interest being referenced with the same identifier.

8. The method according to one of claims 1 to 7, wherein step (a) comprises the detection of at least one element of interest of a first category visible in said image and of at least one element of interest of a second category, different from the first category, visible in said image, by means of at least one convolutional neural network, CNN; each detected element of interest of the first category being associated with a detected element of interest of the second category.

9. The method according to claim 8, wherein an association cost is calculated in step (b) for each pair of an element of interest of the first category detected in the k-th image with an element of interest of the second category already detected in at least one image of the video such that the association counter of said pair is not zero, and for each pair of an element of interest of the second category detected in the k-th image with an element of interest of the first category already detected in at least one image of the video such that the association counter of said pair is not zero.

10. The method according to one of claims 8 and 9, wherein, if step (a) comprises the detection in the k-th image of at least a first element of interest, a second element of interest and a third element of interest, the first element of interest being of the first category and the second and third elements of interest being of the second category, and if the first element of interest is associated with the second element of interest, step (c) comprises, depending on the result of the implementation of said combinatorial optimization algorithm:
- either maintaining the association of the first element of interest with the second element of interest, the association counter of the first element of interest with the second element of interest then being incremented in step (d);
- or reassociating the first element of interest with the third element of interest in place of the second element of interest, the association counter of the first element of interest with the third element of interest then being incremented in step (d).

11. The method according to one of claims 8 to 10, in combination with claim 7, wherein, if step (a) comprises the identification in the k-th image of at least a first element of interest and a second element of interest, but not that of a third element of interest, the first element of interest being of the first category and the third element of interest being of the second category, and if the first element of interest is associated with the second element of interest, step (c) comprises, depending on the result of the implementation of said combinatorial optimization algorithm:
- either maintaining the association of the first element of interest with the second element of interest, the association counter of the first element of interest with the second element of interest then being incremented in step (d);
- or reassigning the identifier of the third element of interest to the second element of interest, the association counter of the first element of interest with the third element of interest then being incremented in step (d).

12. The method according to one of claims 10 and 11, wherein an association counter that has not been incremented is decremented in step (d).

13. The method according to one of claims 1 to 12, wherein the association cost of the i-th element of interest of the first category visible in said k-th image with the j-th element of interest of the second category visible in said k-th image is obtained in step (b) by one of the following formulas:

c_ij = 1 - ac_ij / (n·m),
c_ij = Σ_{l≠j} ac_il + Σ_{l≠i} ac_lj,
c_ij = 1 - (2·ac_ij·sd_i·sd_j) / (Σ_{l=0}^{m} ac_il + Σ_{l=0}^{n} ac_lj), or
c_ij = (Σ_{l≠j} sd_i·ac_il + Σ_{l≠i} sd_j·ac_lj) / (Σ_{l=0}^{m} sd_i·ac_il + Σ_{l=0}^{n} sd_j·ac_lj);

with n the number of elements of interest of the first category and m the number of elements of interest of the second category visible in said k-th image, ac_ij the association counter of the i-th element of interest of the first category with the j-th element of interest of the second category, and sd_i and sd_j respectively the detection scores of said i-th element of interest of the first category and of said j-th element of interest of the second category.

14. A computer program product comprising code instructions for the execution of a method according to one of claims 1 to 13 for tracking elements of interest visible in a video, when said program is executed by a computer.

15. Storage means readable by computer equipment, on which a computer program product comprises code instructions for the execution of a method according to one of claims 1 to 13 for tracking elements of interest visible in a video.
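Steps (b) and (c) of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the cost formula used is only one of the alternatives listed in claim 13, a brute-force search stands in for the Hungarian algorithm of claim 6, and all function and variable names are hypothetical:

```python
import itertools

def association_costs(ac, n, m):
    # Step (b): an illustrative cost in the spirit of claim 13 - pairs
    # that were frequently associated in past images become cheap.
    # ac[i][j] is the association counter of pair (i, j).
    return [[1.0 - ac[i][j] / (n * m) for j in range(m)] for i in range(n)]

def optimal_assignment(cost):
    # Step (c): combinatorial optimization over the cost matrix.
    # Brute force replaces the Hungarian algorithm here (assumes n <= m);
    # fine for the handful of elements visible in a single image.
    n, m = len(cost), len(cost[0])
    best_total, best_perm = float("inf"), None
    for perm in itertools.permutations(range(m), n):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    # best_perm[i] = index of the second-category element reassociated
    # with the i-th first-category element
    return list(best_perm)

# Two persons, two faces; the counters say pairs (0,1) and (1,0) were
# frequently associated in previous images, so they win the assignment.
ac = [[0, 5], [4, 1]]
assignment = optimal_assignment(association_costs(ac, 2, 2))
```

A production implementation would replace the brute-force search with a proper Hungarian (Kuhn-Munkres) solver, which runs in O(n³) and scales to many simultaneous detections.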
Family patents:
FR3087038B1, published 2021-03-19
EP3633544A1, published 2020-04-08
US20200110971A1, published 2020-04-09
US11106946B2, published 2021-08-31